OpenAI has grown dissatisfied with some of Nvidia’s latest AI chips and has been exploring alternatives since last year, Reuters reported, citing eight sources.

The criticism is not directed at chips used for training AI models, a segment where Nvidia remains dominant. Instead, it concerns chips used for inference — the process by which a trained model generates responses to user queries. Seven sources said OpenAI is dissatisfied with the speed at which Nvidia’s hardware produces responses. The issue is particularly acute in applications such as software development with Codex, where latency is critical. OpenAI is said to need new hardware to cover roughly 10% of its future inference demand.

Inference requires different chip architectures

Employees reportedly attributed some of the performance problems to the design of Nvidia’s hardware. Inference workloads involve far more memory traffic than training: during autoregressive generation, the model’s weights must be streamed from memory for every token produced. Nvidia GPUs hold those weights in external memory (HBM), whose bandwidth can become the bottleneck. As a result, OpenAI is seeking chips that keep model data in SRAM embedded directly on the silicon, which offers significantly higher bandwidth and lower latency.
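The bandwidth gap can be made concrete with a rough roofline-style estimate. The sketch below uses purely illustrative numbers (a hypothetical 70B-parameter model in 8-bit weights, and assumed HBM and SRAM bandwidths); none of these figures come from the report:

```python
# Back-of-envelope bound on single-stream decode speed.
# All figures are illustrative assumptions, not vendor specifications.

def max_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on autoregressive decode throughput at batch size 1.

    Generating each token requires streaming all model weights from
    memory once, so memory bandwidth caps tokens per second.
    """
    return bandwidth_bytes_per_s / weight_bytes

weights = 70e9   # hypothetical 70B-parameter model, 8-bit weights (~70 GB)
hbm = 3e12       # assumed off-chip HBM bandwidth: ~3 TB/s
sram = 300e12    # assumed aggregate on-chip SRAM bandwidth: ~100x higher

print(f"HBM-bound decode:  ~{max_tokens_per_second(weights, hbm):.0f} tokens/s")
print(f"SRAM-bound decode: ~{max_tokens_per_second(weights, sram):.0f} tokens/s")
```

Under these assumptions, the single-stream ceiling moves from tens to thousands of tokens per second, which illustrates the kind of latency gap at stake in coding workloads.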

According to Reuters, OpenAI held talks with startups such as Cerebras and Groq. Cerebras rejected an acquisition offer from Nvidia and instead struck a deal with OpenAI. OpenAI CEO Sam Altman confirmed in late January that the Cerebras agreement was intended to meet the speed requirements of coding-focused models.

Discussions with Groq took a different turn: in December, Nvidia signed a $20 billion licensing deal with the startup, effectively ending OpenAI’s negotiations, and also hired away Groq’s chip designers. Meanwhile, Nvidia has introduced the Rubin CPX, a specialized accelerator designed for the prefill phase of AI inference, the stage in which the model processes the user’s prompt before generating tokens one by one.
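Prefill and decode stress hardware differently: prefill processes the whole prompt in parallel and is typically compute-bound, while decode produces one token at a time and is typically memory-bound. A toy arithmetic-intensity comparison (illustrative numbers only, not Rubin CPX specifications) shows why a dedicated prefill chip can make sense:

```python
# Toy arithmetic-intensity comparison of prefill vs. decode.
# Illustrative assumptions; not based on any specific accelerator.

params = 70e9                 # hypothetical 70B-parameter model
weight_bytes = params         # 8-bit weights: 1 byte per parameter
flops_per_token = 2 * params  # ~2 FLOPs per parameter per token

def flops_per_weight_byte(tokens_per_weight_pass: int) -> float:
    """FLOPs performed per byte of weights streamed from memory."""
    return flops_per_token * tokens_per_weight_pass / weight_bytes

# Prefill: a 4096-token prompt amortizes one pass over the weights.
print(f"prefill: ~{flops_per_weight_byte(4096):.0f} FLOPs/byte (compute-bound)")
# Decode (batch size 1): every new token needs its own weight pass.
print(f"decode:  ~{flops_per_weight_byte(1):.0f} FLOPs/byte (memory-bound)")
```

High arithmetic intensity favors compute-dense accelerators; low intensity favors designs with fast on-chip memory, which is consistent with OpenAI’s reported interest in SRAM-based chips.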

$100 billion investment faces delays

In September, Nvidia announced plans to invest up to $100 billion in OpenAI. The deal was initially expected to close within weeks, but negotiations have now dragged on for months. One source said changes to OpenAI’s product roadmap have slowed the talks.

Nvidia CEO Jensen Huang dismissed reports of tensions as “nonsense” on Saturday, saying the company still intends to invest tens of billions of dollars. An OpenAI spokesperson said the company continues to rely on Nvidia for the majority of its inference infrastructure.

Conclusion

The situation underscores a broader shift in the AI industry: as models scale and real-time applications proliferate, inference speed and latency are becoming just as critical as raw training performance.